Associative Clustering by Maximizing a Bayes Factor
Abstract
Clustering by maximizing the dependency between (margin) groupings or partitionings of co-occurring data pairs is studied. We suggest a probabilistic criterion that generalizes discriminative clustering (DC), an extension of the information bottleneck (IB) principle to labeled continuous data. The criterion is the Bayes factor between models assuming dependence and independence of the two cluster sets, and it can be used as a well-founded criterion for IB for small data sets. With suitable prior assumptions the Bayes factor is equivalent to the hypergeometric probability of a contingency table with the optimized clusters at the margins, and for large data it becomes the standard mutual information. An algorithm for two-margin clustering of paired continuous data, associative clustering (AC), is introduced. Genes are clustered to find dependencies between gene expression and transcription factor binding, and dependencies between expression in different organisms.
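The large-data link between the hypergeometric probability of a contingency table and mutual information can be checked numerically. The following is a minimal sketch, not the authors' implementation: the function names `log_hypergeom_table` and `mutual_information` and the example counts are hypothetical, and the table is assumed to hold co-occurrence counts with the two cluster sets at its margins. By Stirling's approximation, the negative log hypergeometric probability divided by the number of samples approaches the empirical mutual information (in nats) as the counts grow.

```python
import math


def log_hypergeom_table(table):
    """Log probability of a contingency table given its margins.

    Uses the multivariate hypergeometric form
    P = (prod_i n_i.! * prod_j n_.j!) / (N! * prod_ij n_ij!).
    """
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)

    def log_fact(k):
        # log(k!) via the log-gamma function
        return math.lgamma(k + 1)

    return (
        sum(log_fact(r) for r in rows)
        + sum(log_fact(c) for c in cols)
        - log_fact(n)
        - sum(log_fact(x) for r in table for x in r)
    )


def mutual_information(table):
    """Empirical mutual information (nats) of the normalized table."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)
    mi = 0.0
    for i, r in enumerate(table):
        for j, x in enumerate(r):
            if x > 0:  # empty cells contribute nothing
                mi += (x / n) * math.log(x * n / (rows[i] * cols[j]))
    return mi


# Hypothetical counts: a dependent 2x2 table, scaled up to mimic large data.
small = [[30, 10], [10, 30]]
large = [[x * 1000 for x in r] for r in small]
n_large = sum(sum(r) for r in large)

per_sample = -log_hypergeom_table(large) / n_large
print(per_sample, mutual_information(large))
```

For the scaled table the two printed values nearly coincide, while for small counts they can differ noticeably, which is the regime where the abstract proposes the Bayes factor as the criterion.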
Related resources
Associative Clustering (AC): Technical Details
This report contains derivations which did not fit into the paper [3]. Associative clustering (AC) is a method for separately clustering two data sets when one-to-one associations between the sets, implying statistical dependency, are available. AC finds Voronoi partitionings that maximize the visibility of the dependency on the cluster level. The main content of this paper are technical result...
A Comparative Study of Issues in Big Data Clustering Algorithm with Constraint Based Genetic Algorithm for Associative Clustering
Clustering can be defined as the process of partitioning a set of patterns into disjoint and homogeneous meaningful groups, called clusters. The growing need for distributed clustering algorithms is attributed to the huge size of databases that is common nowadays. The task of extracting knowledge from large databases, in the form of clustering rules, has attracted considerable attention. Distri...
Integration of Transcription Factor Binding and Gene Expression by Associative Clustering
We integrate paired genomic data sets to reveal their dependencies. We suggest using a dependency-maximizing clustering method for the task. The recently introduced method associative clustering (AC) finds groupings of genes for which the two data sources are maximally dependent. The dependencies between data sources become represented as a contingency table, which is optimized to reveal the as...
Using Supervised Clustering Technique to Classify Received Messages in 137 Call Center of Tehran City Council
Supervised clustering is a data mining technique that assigns a set of data to predefined classes by analyzing dataset attributes. It is considered as an important technique for information retrieval, management, and mining in information systems. Since customer satisfaction is the main goal of organizations in modern society, to meet the requirements, 137 call center of Tehran city council is ...
Publication date: 2003